Google distance between words

Authors

  • Bjørn Kjos-Hanssen
  • Alberto J. Evangelista
Abstract

Rudi Cilibrasi and Paul Vitanyi have demonstrated that it is possible to extract the meaning of words from the World Wide Web. To do so, they take the number of webpages that a Google search returns for a given word and treat that page count as an estimate of the probability that the word appears on a webpage. Conditional probabilities then allow them to correlate one word with another word’s meaning. They have also developed a distance function that gauges how closely related a pair of words is. We review Cilibrasi and Vitanyi’s work and, in particular, aim to improve their distance function by eliminating random data.
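
As a rough illustration of the page-count idea in the abstract, the sketch below computes the normalized Google distance in the form published by Cilibrasi and Vitanyi, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts matching pages and N is the total number of indexed pages. The function names, the example counts, and the value of N are hypothetical and are not real search results. The sketch does not attempt the improvement this paper proposes.

```python
import math


def conditional_probability(f_xy, f_y):
    """Estimate p(x | y) as the share of pages containing y that also contain x."""
    return f_xy / f_y


def ngd(f_x, f_y, f_xy, n):
    """Normalized Google distance computed from raw page counts.

    f_x, f_y: pages containing each word on its own
    f_xy:     pages containing both words
    n:        assumed total number of indexed pages
    """
    # Words that never occur, or never co-occur, are treated as infinitely far apart.
    if f_x <= 0 or f_y <= 0 or f_xy <= 0:
        return math.inf
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(n) - min(log_x, log_y))


# Made-up counts, for illustration only.
print(conditional_probability(10, 100))
print(ngd(1000, 100, 10, 10**6))
```

A distance of 0 means the two words always appear together on the same pages; larger values mean they co-occur less often than their individual frequencies would suggest.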


Similar resources

Soundex-based Translation Correction in Urdu–English Cross-Language Information Retrieval

Cross-language information retrieval is difficult for languages with few processing tools or resources, such as Urdu. Google Translate provides an easy way of translating content words, but due to lexicon limitations named entities (NEs) are transliterated letter by letter. The resulting NE errors (zynydyny zdn for Zinedine Zidane) hurt retrieval. We propose to replace English non-words ...


Literal or idiomatic? Identifying the reading of single occurrences of German multiword expressions using word embeddings

Non-compositional multiword expressions (MWEs) still pose serious issues for a variety of natural language processing tasks, and their ubiquity makes it impossible to get around methods that automatically identify these kinds of MWEs. The method presented in this paper was inspired by Sporleder and Li (2009) and is able to discriminate between the literal and non-literal use of an MWE in an unsu...


Detection and Correction of Malapropisms in Spanish by Means of Internet Search

Malapropisms are real-word errors that lead to syntactically correct but semantically implausible text. We report an experiment on detection and correction of Spanish malapropisms. Malapropos words semantically destroy collocations (syntactically connected word pairs) they are in. Thus we detect possible malapropisms as words that do not form semantically plausible collocations with neighboring...


Unsupervised Japanese-Chinese Opinion Word Translation using Dependency Distance and Feature-Opinion Association Weight

Online shoppers depend on customer reviews when evaluating products or services. However, in the international online marketplace, reviews in a user’s language may not be available. Translation of online customer reviews is therefore an important service. A crucial aspect of this task is translating opinion words, key words that capture the reviewers’ sentiments. This is challenging because opi...


An Analysis of Heavy Metals Quantity Especially Pb, Cr and Cd in Grape and Various Leaves Types of Vitis Vinifera L. Harvested in Malekan Based on the Distance From the Road

Providing healthy food and protecting food sources from pollution has long been a concern of human societies and decision-making centers, so protecting food from pollution and detecting and measuring sources of pollution have become important. Because of the nutritive and political significance of grapes in this area, the extensive use of the leaves and fruit of this plant, developing urban areas around gr...



Journal title:
  • CoRR

Volume abs/0901.4180  Issue 

Pages  -

Publication date 2006